HARDWARE DEVICE FOR PROCESSING THE TASKS OF AN ALGORITHM IN PARALLEL



Technical field 

The invention relates to processing of algorithms used in the search engines of a large data
communication network such as the Internet, and relates more particularly to hardware devices for
processing the tasks of any algorithm in parallel.

Background

The World Wide Web (WWW) provides access to a large body of information. Compared with
traditional databases, Web information is dynamic and structured with hyperlinks. Also, it can be
represented in different forms and is globally shared over multiple sites and platforms. Hence,
querying over the WWW is significantly different from querying data from traditional databases, e.g.
relational databases, which are structured, centralized and static. Traditional databases can cope with
a small number of information sources, but they are ineffective for thousands.

Most Web documents are text-oriented. Most relevant information is usually embedded in the text
and cannot be explicitly or easily specified in a user query. To facilitate Web searching, many search
engines and similar programs have been developed. Most of these programs are database-based,
meaning that the system maintains a database and a user searches the Web by specifying a set of keywords
and formulating a query to the database. Web search aids are variously referred to as catalogs,
directories, indexes, search engines, or Web databases.

A search engine is a Web site on the Internet which someone may use to find desired Web pages and
sites. A search engine will generally return the results of a search ranked by relevancy.

A competent Web search engine must include the fundamental search facilities that Internet users are
familiar with, which include Boolean logic, phrase searching, truncation, and limiting facilities (e.g.
limit by field). Most of the services try more or less to index the full text of the original documents,
which allows the user to find quite specialized information. Most services use best match retrieval
systems; some use a Boolean system only.

Web search engines execute algorithms having internal processes which are repetitive tasks with
independent entry data. A classical step by step processing of all processes and decisions on one entry
data before processing the next entry data is inefficient since it takes too much time to process all the
data. For example, it is common to perform a search of a pattern within each file of a disk. The main
repetitive processes to perform are: load file, open file, scan each word and compare for matching
with a pattern, append the result in a temporary file, close file.
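
For reference, the repetitive processes listed above can be sketched as a conventional sequential loop. This is only an illustration of the step by step approach; the function name `search_files`, the result format and the choice of Python are assumptions and do not come from the described device.

```python
def search_files(paths, pattern, result_path):
    """Sequentially scan each file for words equal to `pattern`.

    Hypothetical helper: each file is fully processed (open, scan,
    append, close) before the next one is started.
    """
    matches = 0
    for path in paths:
        # Load and open the file.
        with open(path, "r", encoding="utf-8") as src:
            # Scan each word and compare it with the pattern.
            for line_no, line in enumerate(src, start=1):
                for word in line.split():
                    if word == pattern:
                        # Append the result to the temporary file.
                        with open(result_path, "a", encoding="utf-8") as out:
                            out.write(f"{path}:{line_no}\n")
                        matches += 1
        # The file is closed on leaving the `with` block.
    return matches
```

Every file waits for the previous one to be closed, which is the serialization that the parallel approaches discussed next try to remove.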

One way to improve the performance, and in particular to improve the search response time, is to 
achieve parallel processing by parallelizing the search mechanism in the database or index table. Such 
software parallelization will be more optimized but is nevertheless limited insofar as the software 
processing, even if parallelized, requires a minimum of time which cannot be reduced. 

Summary of the invention

Accordingly, the object of the invention is to provide a hardware assist device able to run a set of
repetitive processes using local pipelining for each task, and maintaining a relationship between the
parent task and the child task for each occurrence in the pipeline.

Another object of the invention is to provide a hardware device for processing the tasks of a search
algorithm in parallel wherein each specific task of the search is performed by a dedicated processor.

The invention relates therefore to a hardware device for processing the tasks of an algorithm of the
type comprising a number of processes the execution of some of which depend on binary decisions,
the device comprising a plurality of task units which are each associated with a task defined as being
either one process or one decision or one process together with the following decision, and a task
interconnection logic block connected to each task unit for communicating actions from a source task
unit to a destination task unit, each task unit including a processor for processing the steps of the
associated task when the received action requests such a processing and a status manager for handling
the actions coming from other task units and building the actions to be sent to other task units.

Brief description of the drawings 

The above and other objects, features and advantages of the invention will be better understood by
reading the following more particular description of the invention in conjunction with the
accompanying drawings wherein:

Fig. 1 represents an exemplary algorithm composed of three processes and two decisions.

Fig. 2 represents the algorithm illustrated in Fig. 1 which has been structured into several tasks 
to be executed by the hardware device according to the invention. 

Fig. 3 is a block-diagram representing the hardware device according to the invention. 

Fig. 4 is a representation of the configuration register used to control each task executed by the
hardware device of Fig. 3.

Fig. 5A and 5B are tables representing respectively the actions to be executed by each task of
the algorithm illustrated in Fig. 1 as a function of the possible activation sources for an instance and the
following instance.

Fig. 6 is a block-diagram representing the connection between the task interconnection logic
block of the hardware device of Fig. 3 and the different tasks of the algorithm.

Detailed description of the invention

The exemplary algorithm illustrated in Fig. 1 includes three processes P1, P2 and P3 and two decisions
D1 and D2. Depending on each decision, different functions corresponding to the different paths in
the algorithm may be run. The first function is represented by the algorithm flow when decision D1
is "yes", that is when processes P1 and P2 are to be executed. The second function is represented by
the algorithm flow when decision D1 is "no" and decision D2 is "yes", that is when processes P1 and
P3 are to be executed. Finally, the third function is represented by the algorithm flow when decision
D1 is "no" and decision D2 is also "no", that is when only process P1 is to be executed. In the latter
case, the algorithm flow loops back to the entry point and the same functions may be executed again.
Thus, during the first algorithm flow, process P1 is started, and the execution of process P1 is started
again when decision D2 is "no". The second execution of P1 starts only after the first execution of P1
has been completed and decisions D1 and D2 have been completed. Therefore, there is no overlap
possible in a simple step by step processing of the algorithm.
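
The step by step flow described above can be sketched in a few lines. This is a minimal illustration, assuming that each pass through the entry point consumes one new entry data item and that the processes and decisions are supplied as ordinary callables; the names `run_sequential`, `p1`, `d1`, `p2`, `d2` and `p3` are hypothetical.

```python
def run_sequential(entries, p1, d1, p2, d2, p3):
    """Step by step flow of Fig. 1: one entry is finished before the next starts."""
    trace = []
    for entry in entries:
        data = p1(entry)
        trace.append("P1")
        if d1(data):
            # D1 "yes": first function (P1 and P2).
            p2(data)
            trace.append("P2")
        elif d2(data):
            # D1 "no", D2 "yes": second function (P1 and P3).
            p3(data)
            trace.append("P3")
        # D1 "no" and D2 "no": third function, only P1 was executed.
    return trace
```

In this form the next call to `p1` cannot begin until both decisions for the current entry have been evaluated, which is the absence of overlap noted above.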

Though the algorithm represented in Fig. 1 is very simple, all algorithms are classically run in the
same way. All the events (processes or decisions) of the algorithm flow have to be executed step by
step although they are run repetitively with new entry data. The proposed invention allows the
various processes and decisions to run separately in order to speed up the processing of the algorithm,
especially when there is no prior data required on some steps. The main idea to achieve this is to have
one processor assigned to each task, a task including a process, a decision or a combination of processes and
decisions. This processor runs all the repetitive instances of the task and is linked to the execution
results of the other task processors using more detailed link information than the simple conventional
link enabling the downstream tasks to be activated.

Using the principles of the invention, the algorithm of Fig. 1 can be divided into tasks as illustrated 
in Fig. 2. Four tasks are thus implemented. 

Task 1 (T1) includes process P2 (no decision)
Task 2 (T2) includes process P3 (no decision)
Task 3 (T3) includes the sequential combination of process P1 and decision D1
Task 4 (T4) includes only decision D2 (no process)

According to the invention, each task is repetitively performed by one processor allocated to this task. 
Therefore, four processors will be required to run the example algorithm of Fig. 1 and Fig. 2. 

The hardware device according to the invention illustrated in Fig. 3 comprises as many task units 10,
12, 14 as the number of tasks included in the algorithm (Task 1, Task 2, ... Task n). The interconnection
between the tasks is performed by the intermediary of a Task interconnection logic block 16 as
explained hereafter.

Each task unit like task unit 10 includes a processor 18 in charge of processing the sequential steps
of the process, the decision or the combination of the process and the decision generally incorporated
in the corresponding task. Actions received from other task units or sent to other task units by means
of Task interconnection logic block 16 are managed by status manager 20 which is preferably a state
machine. Status manager 20 is connected to processor 18 by two lines, an input line to processor 18
for starting (S) the task execution and the output line from the processor which is activated when the
task is completed (C).

Status manager 20 has essentially two functions (input and output). The input function handles
incoming commands from other tasks and the output function builds commands to be sent to other
tasks. To perform these functions in conjunction with processor 18, several control/data registers 22,
24, 26 are used. Each control/data register corresponds, for this task, to an instance of the algorithm
flow. The number of instances which can be run at the same time depends upon the pipeline capability
of processor 18. Generally, it is necessary to have three control/data registers corresponding to
instances m, m+1, m+2.

Each control/data register 22, 24 or 26 contains a control field and a data field. The control field is
composed of three bits controlled by processor 18: a validation bit V, a completion bit C and a bit L/R
indicating whether the output is Left or Right when the task includes a decision.

The data field of a control/data register contains data which are loaded by status manager 20 after
receiving an action to be performed from another task and before starting the task execution by
sending the start command to task processor 18. These data may be used by processor 18. When the
latter has completed the task execution, it may replace the data contained in the control/data register
by other data. These data will then be sent to the destination task in the command word and used as an
input field by the destination task processor. However, it must be noted that, in case of independent
tasks, the data are not modified in the control/data register.

When the task execution has been completed by processor 18, the processor sets bit C of the
control field of the control/data register to 1 and a signal C may be sent to status manager 20. Therefore,
either the status manager is activated by the input signal C from task processor 18, or a polling
or interrupt mechanism informs the status manager that bit C has been set to 1.

The commands which may be received from another task by status manager 20 are START, KILL or
VALID. As already mentioned, the START command is used to activate task processor 18. The KILL
command means that a task is no longer of interest since the decision taken is opposite to this task.
Thus, a task which is on the left path of a decision may be killed if the decision is to take the right path.
When it receives a KILL command, status manager 20 clears the control/data register corresponding
to the instance being considered, since each command has as a parameter the instance value, called level.
Conversely to the KILL command, the VALID command confirms that the considered task
corresponds to the decision taken. In such a case, the bit V of the corresponding control/data register
is set to 1 by status manager 20.
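
The input function just described can be modelled in software as follows. This is a behavioural sketch only, not the state machine of the device: the class names, the `started` flag and the mapping of a level to a register by `level % 3` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ControlDataRegister:
    """One control/data register: control bits V, C, L/R plus a data field."""
    v: int = 0
    c: int = 0
    lr: int = 0
    data: object = None
    started: bool = False  # hypothetical bookkeeping, not a bit named in the text


class StatusManager:
    """Input function of a status manager (illustrative sketch only)."""

    def __init__(self, levels=3):
        # One register per instance (level) that may be in flight.
        self.registers = [ControlDataRegister() for _ in range(levels)]

    def receive(self, command, level, data=None):
        index = level % len(self.registers)  # assumed level-to-register mapping
        reg = self.registers[index]
        if command == "START":
            reg.data = data      # load the data field, then start the processor
            reg.started = True
        elif command == "VALID":
            reg.v = 1            # the task lies on the path actually taken
        elif command == "KILL":
            # Clear the register of the instance named by the level parameter.
            self.registers[index] = ControlDataRegister()
        else:
            raise ValueError(f"unknown command: {command}")
```

A KILL simply resets the register for the given level, so a later START for a new instance can reuse it.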

The output function of status manager 20 is to build commands based on the contents of two
configuration registers, CONFIG.L 28 and CONFIG.R 30, and also on the contents of the involved
control/data register. The contents of CONFIG.L register, which is selected when bit L/R is set to 1, are
given in Fig. 4. Note that the CONFIG.R register, which is selected when bit L/R is set to 0, has exactly
the same structure as CONFIG.L register. Note that the CONFIG.L and CONFIG.R registers are
loaded at the beginning of algorithm processing and remain unchanged insofar as they contain data
fields depending only on the algorithm structure.

As illustrated in Fig. 4, CONFIG.L register contains a first block C selected when bit C is set to 1 and
a second block V selected when bit V is set to 1. Each block C or V is used for two actions. For each
action the register contains the three following fields wherein X = C or V and n = 1 or 2.

Task Xn indicates which task should be activated.

Axn indicates which action is to be performed. For example, 00 = kill, 01 = start, 10 =
valid and 11 = valid + start.

Lxn indicates the level of task (the instance) corresponding to Task Xn. For example,
00 = current level - 1, 01 = current level, 10 = current level + 1, 11 = current level + 2.
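
The two-bit example codes given for Axn and Lxn can be exercised with a small encoder and decoder. This is a sketch under stated assumptions: the width of the Task Xn field (4 bits here) and the packing order of the three fields are not given in the text and are chosen only for illustration, as are the function names.

```python
ACTIONS = {0b00: "kill", 0b01: "start", 0b10: "valid", 0b11: "valid + start"}
LEVEL_OFFSETS = {0b00: -1, 0b01: 0, 0b10: 1, 0b11: 2}
TASK_BITS = 4  # assumed width of the Task Xn field; not given in the text


def encode_action(task, action_code, level_code):
    """Pack Task Xn, Axn and Lxn into one integer: [task | action | level]."""
    if not 0 <= task < (1 << TASK_BITS):
        raise ValueError("task number does not fit in the Task Xn field")
    if action_code not in ACTIONS or level_code not in LEVEL_OFFSETS:
        raise ValueError("action and level codes are two-bit values")
    return (task << 4) | (action_code << 2) | level_code


def decode_action(word, current_level):
    """Unpack a field and resolve the level code against the current level."""
    task = word >> 4
    action = ACTIONS[(word >> 2) & 0b11]
    level = current_level + LEVEL_OFFSETS[word & 0b11]
    return task, action, level
```

For instance, a field naming Task 3 with Axn = 01 and Lxn = 10 decodes, at current level 5, to a start of Task 3 at level 6.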



The example of the algorithm illustrated in Fig. 2 will be considered below. In Fig. 2 there are four
tasks T1, T2, T3 and T4 which can be executed, but there are six activation sources since Task 3 and
Task 4 each have two outputs. Furthermore, a task acting as a source task can activate a destination
task in the same level or in the following level. Fig. 5A and Fig. 5B represent tables wherein the
activation sources are associated with the columns whereas the tasks to be activated are associated
with the rows. Fig. 5A corresponds to the activation of the tasks in a same level whereas Fig. 5B
corresponds to the activation of the tasks in level m+1 by activation sources in level m. It should be
noted that since only two levels are represented, this means that there is no relationship between the
processes of the algorithm on more than two consecutive levels.

In the tables illustrated in Fig. 5A and 5B, only the cells corresponding to an action from an
activation source to a task are filled with a letter. Letter S means Start, V means Validate and K
means Kill. It must be noted that the same source may have an action on two tasks. Thus,
T3R kills Task 1, and starts and validates Task 4.

As already mentioned, status manager 20 (Fig. 3) uses the control bits which have been previously
loaded in CONFIG.L and CONFIG.R registers associated with the task. Thus, if we consider Task 3
which generates two activation sources, the CONFIG.L and CONFIG.R registers have the following
contents:

CONFIG.L

1. Block C
      Action 1    Task C1 = Task 3
                  AC1 = start
                  LC1 = current level + 1
      Action 2    none

2. Block V
      Action 1    Task V1 = Task 1
                  AV1 = valid
                  LV1 = current level
      Action 2    none

CONFIG.R

1. Block C
      Action 1    Task C1 = Task 3
                  AC1 = start
                  LC1 = current level + 1
      Action 2    none

2. Block V
      Action 1    Task V1 = Task 1
                  AV1 = kill
                  LV1 = current level
      Action 2    Task V2 = Task 4
                  AV2 = valid + start
                  LV2 = current level
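
The register contents listed above for Task 3 can be written as data and used to derive the commands that the output function would build. This is an illustrative sketch: the dictionary layout, the tuple format and the name `build_commands` are assumptions, while the selection rules (CONFIG.L when L/R is 1, block C when bit C is 1, block V when bit V is 1) follow the description.

```python
# (task, action, level offset) triples transcribed from the listing above.
CONFIG_L = {
    "C": [(3, "start", +1)],
    "V": [(1, "valid", 0)],
}
CONFIG_R = {
    "C": [(3, "start", +1)],
    "V": [(1, "kill", 0), (4, "valid + start", 0)],
}


def build_commands(c_bit, v_bit, lr_bit, level):
    """Output function sketch: list the commands Task 3 would emit.

    CONFIG.L is selected when L/R is 1 and CONFIG.R when it is 0; block C
    applies when bit C is 1 and block V when bit V is 1.
    """
    config = CONFIG_L if lr_bit == 1 else CONFIG_R
    commands = []
    if c_bit:
        commands += [(t, a, level + off) for t, a, off in config["C"]]
    if v_bit:
        commands += [(t, a, level + off) for t, a, off in config["V"]]
    return commands
```

With bits C and V both set and L/R equal to 0 at level 2, the sketch yields a start of Task 3 at level 3, a kill of Task 1 at level 2 and a valid + start of Task 4 at level 2, which matches the T3R behaviour described for Fig. 5A.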

The Task interconnection logic block 16 is represented in Fig. 6. Each task such as Task 1, Task 2,
Task 3, ... Task n is an input to Task interconnection logic block 16 but is also an output to this block.
Each input action or command could be of the same type as each one of the output actions such as
KILL, START or VALID. Using the CONFIG.L and CONFIG.R registers where an action is
represented by three control fields Task Xn, Axn and Lxn, an action word may use these control fields
in addition to the corresponding data (see Fig. 4) to transmit the action to the destination task.

In the preferred embodiment illustrated in Fig. 6, the action word containing the control bits of
CONFIG.L or CONFIG.R registers and data is input to a three-state driver 40, 42, 44 or 46 where the
Task Xn field is decoded in order to select on which bus this action word should be put. This word,
or the remaining bits insofar as the Task Xn field is no longer used, is then decoded by the
appropriate task to perform the requested action.

As illustrated in Fig. 6, there are as many buses as the number of tasks. These buses are three-state
so that all inactive inputs have no influence on the bus value. Only the valid one forced by the
corresponding driver takes the bus for its command. The width of the bus depends on the size of the
action word. In the preferred embodiment the bus size is equal to the word size. If the bus cannot be
made that wide, it is well known how to split the word into several blocks which are sent in sequence on
a smaller bus. The only drawback of this split will be an increased transmission latency as it will need
several clock times to transmit a command or action from one output task to an input task. At a minimum,
the Task Xn field should be available in the first block of the split word to be decoded correctly.
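
Splitting an action word for a narrower bus can be illustrated as follows. This is a sketch only, assuming that the Task Xn field occupies the most significant bits of the word and fits within the first block; the function names and the most-significant-first ordering are assumptions.

```python
def split_word(word, word_bits, bus_bits):
    """Split an action word into bus-sized blocks, most significant block first."""
    if word < 0 or word >= (1 << word_bits):
        raise ValueError("word does not fit in word_bits")
    n_blocks = -(-word_bits // bus_bits)  # ceiling division
    mask = (1 << bus_bits) - 1
    return [(word >> (bus_bits * i)) & mask for i in reversed(range(n_blocks))]


def join_blocks(blocks, bus_bits):
    """Reassemble the blocks received over successive clock cycles."""
    word = 0
    for block in blocks:
        word = (word << bus_bits) | block
    return word
```

Each block costs one clock time on the bus, so a 16-bit word on an 8-bit bus takes two transfers instead of one.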

Each task can then put all the actions on the various buses. As long as two tasks can never put
an action on the same bus simultaneously, no arbitration is required. This is
the case for most algorithms. Otherwise, an arbitration mechanism may be added on the control
of each three-state driver to identify two simultaneous requests for the same destination. A very
simple contention mechanism will for example give the priority on the destination bus to the
lowest-numbered source task.
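
The simple contention mechanism mentioned above can be sketched in software. The sketch reads "lower source task" as the source task with the lowest task number, which is an interpretation; the function name `arbitrate` and the tuple format of a request are hypothetical.

```python
def arbitrate(requests):
    """Resolve simultaneous requests per destination bus.

    `requests` is a list of (source_task, destination_task, action_word)
    tuples raised in the same clock cycle. For each destination bus the
    lowest-numbered source task wins; the others are deferred.
    """
    granted = {}
    deferred = []
    # Sorting by source task number makes the lowest source reach each bus first.
    for source, dest, word in sorted(requests, key=lambda r: r[0]):
        if dest not in granted:
            granted[dest] = (source, word)
        else:
            deferred.append((source, dest, word))
    return granted, deferred
```

Requests for different destination buses never conflict, so only requests that share a destination are serialized.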

CLAIMS



Claim 1.

A hardware device for concurrently processing a plurality of tasks associated with an
algorithm which includes a number of processes some of which are dependent on binary decisions,
said device comprising:

a plurality of task units (10, 12, 14) for processing data, making decisions and/or processing
data and making decisions;

a task interconnection logic means (16) interconnecting the task units for communicating
actions from a source task unit to a destination task unit,

each of said task units including a processor (18) for executing the steps of the associated task
in response to a received request action; and,

a status manager (20) for handling actions from source task units and building actions to be
sent to destination task units.

Claim 2.

Hardware device according to claim 1, wherein said actions communicated from a source task
unit to a destination task unit are START used to activate the processor (18) of said destination task
unit, KILL used to cancel the task associated with said destination task unit and VALID used to
confirm that the task associated with said destination task unit corresponds to a decision included in said
task.



Claim 3.

Hardware device according to claim 2, wherein said status manager (20) activates said
processor (18) for processing the steps of the task associated with said destination task unit when the
action received from a source task unit is START.

Claim 4.

Hardware device according to claim 3, wherein said status manager (20) is a state machine.

Claim 5.

Hardware device according to claim 3, wherein each of said task units (10, 12, 14) further
comprises a plurality of control/data registers (22, 24, 26) each corresponding, for the task associated
with said task unit, to an instance of the algorithm flow, each one of said control/data registers
comprising a control field composed of a completion bit (C) set to 1 when the associated task is
completed, a validation bit (V) set to 1 when the associated task is validated and a L/R bit indicating
that the output in the algorithm flow is left or right when said task includes a decision.

Claim 6.

Hardware device according to claim 5, wherein each of said control/data registers (22, 24, 26)
includes a data field which is loaded if necessary by said status manager (20) activated by an action
received from a source task unit, said processor (18) using these data for executing the task associated
with said task unit and replacing them if necessary.

Claim 7.

Hardware device according to claim 6, wherein said completion bit (C) is sent by said
processor (18) to said status manager (20) after completion of the task execution.



Claim 8.

Hardware device according to claim 5, 6 or 7, wherein said control/data register (22, 24 or 26)
corresponding to a specific instance is cleared by said status manager (20) when the latter receives an
action KILL for the task associated with said task unit and for said specific instance.

Claim 9.

Hardware device according to any one of claims 5 to 7, wherein each one of said task units
(10, 12, 14) further comprises two configuration registers CONFIG.L (28) and CONFIG.R (30) which
are respectively selected by the binary value of said bit L/R of the control/data register (22, 24, 26)
of the instance being considered, the contents of said configuration registers being loaded at the
beginning of the algorithm processing for defining the task to be activated, the action to be performed
and the instance to be considered.

Claim 10.

Hardware device according to any one of claims 1 to 7, wherein said task interconnection logic
block (16) is composed of three-state drivers (40, 42, 44), each one of said drivers being associated
with one of said tasks as input task, and a number of buses equal to the number of said tasks as output
tasks, one of said buses being selected by the driver corresponding to an input task after decoding an
action word by said driver.



Abstract 

A hardware device for processing the tasks of an algorithm of the type having a number of processes
the execution of some of which depend on binary decisions has a plurality of task units (10, 12, 14),
each of which is associated with a task defined as being either one process or one decision or one
process together with a following decision. A task interconnection logic block (16) is connected to
each task unit for communicating actions from a source task unit to a destination task unit. Each task
unit includes a processor (18) for processing the steps of the associated task when a received action
requests such a processing. A status manager (20) handles actions coming from other task units and
builds actions to be sent to other task units.